On the difference-to-sum power ratio of speech and wind noise based on the Corcos model
The difference-to-sum power ratio was proposed and used to suppress wind
noise under specific acoustic conditions. In this contribution, a general
formulation of the difference-to-sum power ratio associated with a mixture of
speech and wind noise is proposed and analyzed. In particular, it is assumed
that the complex coherence of convective turbulence can be modelled by the
Corcos model. In contrast to the work in which the power ratio was first
presented, the employed Corcos model holds for every possible air stream
direction and takes into account the lateral coherence decay rate. The obtained
expression is subsequently validated with real data for a dual microphone
set-up. Finally, the difference-to-sum power ratio is exploited as a spatial
feature to indicate the frame-wise presence of wind noise, obtaining improved
detection performance when compared to an existing multi-channel wind noise
detection approach.
Comment: 5 pages, 3 figures, IEEE-ICSEE Eilat-Israel conference (special
session
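As an illustration of the feature named in the abstract above, the following is a minimal sketch of a frame-wise difference-to-sum power ratio for two microphone signals. It is not the paper's formulation: it works on broadband time-domain frames, and the function name, frame length, hop size and regularization constant `eps` are assumptions chosen for the example.

```python
import numpy as np

def difference_to_sum_power_ratio(x1, x2, frame_len=512, hop=256, eps=1e-12):
    """Frame-wise ratio of difference-signal power to sum-signal power.

    Hypothetical helper: broadband, time-domain version for illustration only.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n_frames = 1 + (len(x1) - frame_len) // hop
    ratios = np.empty(n_frames)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        diff_power = np.sum((x1[seg] - x2[seg]) ** 2)
        sum_power = np.sum((x1[seg] + x2[seg]) ** 2)
        ratios[i] = diff_power / (sum_power + eps)  # eps avoids division by zero
    return ratios

rng = np.random.default_rng(0)
coherent = rng.standard_normal(4096)
# Identical signals at both microphones: the difference signal vanishes.
ratio_coherent = difference_to_sum_power_ratio(coherent, coherent)
# Mutually uncorrelated signals: difference and sum carry similar power.
ratio_incoherent = difference_to_sum_power_ratio(
    rng.standard_normal(4096), rng.standard_normal(4096)
)
```

In this toy setting, fully coherent in-phase signals give a ratio near zero, while uncorrelated signals give a ratio near one, which is why the ratio can serve as a spatial cue for weakly coherent noise.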
Broadband DOA estimation using Convolutional neural networks trained with noise signals
A convolutional neural network (CNN) based classification method for broadband
DOA estimation is proposed, where the phase component of the short-time Fourier
transform coefficients of the received microphone signals is directly fed into
the CNN and the features required for DOA estimation are learnt during
training. Since only the phase component of the input is used, the CNN can be
trained with synthesized noise signals, thereby making the preparation of the
training data set easier compared to using speech signals. Through experimental
evaluation, the ability of the proposed noise trained CNN framework to
generalize to speech sources is demonstrated. In addition, the robustness of
the system to noise, small perturbations in microphone positions, as well as
its ability to adapt to different acoustic conditions is investigated using
experiments with simulated and real data.
Comment: Published in Proceedings of IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (WASPAA) 201
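The input representation described in the abstract above (STFT phase per microphone, here computed from synthesized noise) can be sketched as follows. This is only an illustration: the function name, the Hann window, the FFT size and the four-microphone example are assumptions, not details taken from the paper.

```python
import numpy as np

def stft_phase_map(frames, n_fft=512):
    """Phase of one STFT time frame for each microphone.

    frames: array of shape (M, frame_len), one time-domain frame per microphone.
    Returns an array of shape (M, n_fft // 2 + 1) with phases in radians.
    """
    frames = np.asarray(frames, dtype=float)
    window = np.hanning(frames.shape[1])  # assumed analysis window
    spectrum = np.fft.rfft(frames * window, n=n_fft, axis=1)
    return np.angle(spectrum)

rng = np.random.default_rng(0)
noise_frames = rng.standard_normal((4, 512))  # 4 microphones, white-noise frame
phase_map = stft_phase_map(noise_frames)      # microphones x frequency bins
```

Because only `np.angle` of the spectrum is kept, the magnitude of the training signal is discarded, which is consistent with the abstract's point that noise signals can stand in for speech during training.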
Online DOA estimation using real eigenbeam ESPRIT with propagation vector matching
The eigenbeam estimation of signal parameters via rotational invariance technique (EB-ESPRIT) [1] is a method to estimate multiple directions-of-arrival (DOAs) of sound sources from a spherical microphone array recording in the spherical harmonics domain (SHD). The method first constructs a signal subspace from the SHD signal and then makes use of the fact that, for plane-wave sources, the signal subspace is spanned by the (complex conjugate) spherical harmonic vectors at the source directions. The DOAs are then estimated from the signal subspace using recurrence relations of spherical harmonics.
In recent publications, the singularity and ambiguity problems of the original EB-ESPRIT have been solved by jointly combining several types of recurrence relations. The state-of-the-art EB-ESPRIT, denoted as DOA-vector EB-ESPRIT, is based on three recurrence relations [2,3]. This EB-ESPRIT variant can estimate the source DOAs with significantly higher accuracy than the other EB-ESPRIT variants [3]. However, a permutation problem arises, which can be solved by using, for example, a joint diagonalization method [3].
For parametric spatial audio signal processing in the short-time Fourier transform (STFT) domain, DOA estimates are usually needed per time frame and frequency bin. In principle, one can use the DOA-vector EB-ESPRIT method to estimate the source DOAs per time-frequency bin in an online manner. However, due to the eigendecomposition of the PSD matrix and the joint diagonalization procedure, the computational cost might be too large for many real-time applications.
In this work, we propose a computationally more efficient version of the DOA-vector EB-ESPRIT based on real spherical harmonics recurrence relations.
First, we separate the real and imaginary parts of the real SHD signal in the STFT domain and then construct a real signal subspace from them, which can be recursively estimated using the deflated projection approximation subspace tracking (PASTd) [4] method. For the case of one source per time-frequency bin, the joint diagonalization is not necessary and we can simplify the EB-ESPRIT equations. For the case of two sources, the plane-wave propagation vectors can directly be estimated from the signal subspace eigenvectors by employing properties of the propagation vectors. This method can be seen as a higher-order ambisonics extension of the robust B-format DOA estimation in [5]. The proposed method for estimating two DOAs can be summarized as follows:
1. Separate real and imaginary parts of the real SHD signal in the STFT domain.
2. Recursively estimate the signal subspace eigenvectors using PASTd.
3. Estimate the two plane-wave propagation vectors from the signal subspace eigenvectors, using the fact that they span the same subspace together with properties of the propagation vectors (subspace-propagation vector matching).
4. Estimate the DOAs by using three types of real spherical harmonics recurrence relations.
Alternatively, one can estimate the DOAs analogously to the complex DOA-vector EB-ESPRIT using the joint diagonalization method proposed in [3].
For the evaluation, we simulate SHD signals up to third order with one and two speech sources in reverberant and noisy environments. For the one-source scenarios, we compare the real DOA-vector EB-ESPRIT with subspace estimation based on singular value decomposition (SVD) against PASTd.
For the two-source scenarios, we compare the real DOA-vector EB-ESPRIT with joint diagonalization against subspace-propagation vector matching and the robust B-format DOA estimation method.
We analyze the angular distributions of the DOA estimates and find that DOA estimation using PASTd for the signal subspace estimation is slightly less accurate than the SVD-based method but computationally much more efficient. For the estimation of two DOAs, the EB-ESPRIT based methods outperform the robust B-format estimation method when higher SHD orders are considered. The joint diagonalization method is more accurate than the subspace-propagation vector matching method. However, the latter is computationally more efficient.
References:
[1] H. Teutsch and W. Kellermann, "Detection and localization of multiple wideband acoustic sources based on wavefield decomposition using spherical apertures," in Proc. IEEE Intl. Conf. Acoust., Speech Signal Proc. (ICASSP), Mar. 2008, pp. 5276-5279.
[2] B. Jo and J. W. Choi, "Nonsingular EB-ESPRIT for the localization of early reflections in a room," J. Acoust. Soc. Am., vol. 144, no. 3, p. 1882, Sep. 2018.
[3] A. Herzog and E. A. P. Habets, "Eigenbeam-ESPRIT for DOA-vector estimation," IEEE Signal Process. Lett., vol. 26, no. 4, pp. 572-576, April 2019.
[4] B. Yang, "Projection approximation subspace tracking," IEEE Trans. Sig. Proc., vol. 43, no. 1, Jan. 1995.
[5] O. Thiergart and E. A. P. Habets, "Robust direction-of-arrival estimation of two simultaneous plane waves from a B-format signal," IEEE 27th Conv. of Electrical and Electronics Engineers in Israel, Nov. 2012.
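The recursive subspace estimation in step 2 of the abstract above relies on PASTd [4]. The following is a minimal sketch of the real-valued PASTd recursion on generic snapshot vectors, not on SHD signals and not the authors' implementation. The function name, the forgetting factor `beta`, the identity-based initialization and the synthetic two-source test data are assumptions made for the example.

```python
import numpy as np

def pastd(X, rank, beta=0.99):
    """Track `rank` dominant eigenvectors of real snapshots X with shape (T, n).

    Returns W (n, rank) with eigenvector estimates and d (rank,) with
    exponentially weighted power estimates along each tracked direction.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    W = np.eye(n)[:, :rank].copy()  # assumed initialization
    d = np.ones(rank)
    for x in X:
        x = x.copy()
        for i in range(rank):
            y = W[:, i] @ x                 # project onto current estimate
            d[i] = beta * d[i] + y * y      # update power estimate
            err = x - W[:, i] * y           # approximation error
            W[:, i] += err * (y / d[i])     # rank-one eigenvector update
            x -= W[:, i] * y                # deflation for the next component
    return W, d

rng = np.random.default_rng(1)
a1 = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0]) / 2.0    # dominant direction
a2 = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0]) / 2.0  # weaker direction
T = 3000
X = (2.0 * rng.standard_normal((T, 1)) * a1
     + rng.standard_normal((T, 1)) * a2
     + 0.01 * rng.standard_normal((T, 6)))
W, d = pastd(X, rank=2)
w1 = W[:, 0] / np.linalg.norm(W[:, 0])
alignment = abs(w1 @ a1)  # |cosine| between estimate and dominant direction
```

Each snapshot costs only a few vector operations per tracked component, which is the reason such a recursion is attractive compared to recomputing an eigendecomposition or SVD per time-frequency bin.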
Simulating Multi-channel Wind Noise Based on the Corcos Model
A novel multi-channel artificial wind noise generator based on a fluid
dynamics model, namely the Corcos model, is proposed. In particular, the model
is used to approximate the complex coherence function of wind noise signals
measured with closely-spaced microphones in the free-field and for
time-invariant wind stream direction and speed. Preliminary experiments focus
on a spatial analysis of recorded wind noise signals and the validation of the
Corcos model for diverse measurement set-ups. Subsequently, the Corcos model is
used to synthetically generate wind noise signals exhibiting the desired
complex coherence. The multi-channel generator is designed by extending an
existing single-channel generator to create N mutually uncorrelated signals,
while the predefined complex coherence function is obtained by exploiting an
algorithm developed to generate multi-channel non-stationary noise signals
under a complex coherence constraint. Temporal, spectral and spatial
characteristics of synthetic signals match with those observed in measured wind
noise. The artificial generation overcomes the time-consuming challenge of
collecting pure wind noise samples for noise reduction evaluations and provides
flexibility in the number of generated signals used in the simulations.
Comment: 5 pages, 2 figures, IWAENC 201
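To make the notion of a Corcos-type complex coherence more concrete, the sketch below evaluates one commonly used form: an exponential magnitude decay combined with a phase term given by the convective delay between the microphones. The decay constants `alpha1` and `alpha2`, the convective-to-free-stream speed ratio, the sign convention of the phase and the function name are all assumptions for illustration and are not taken from the abstract above.

```python
import numpy as np

def corcos_coherence(freqs_hz, mic_distance, wind_speed, theta,
                     alpha1=0.125, alpha2=0.7, convective_ratio=0.8):
    """Corcos-type complex coherence between two microphones.

    theta is the angle (radians) between the wind direction and the
    microphone axis. alpha1/alpha2 are assumed longitudinal/lateral decay
    constants; convective_ratio scales the wind speed to a convective speed.
    """
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    u_c = convective_ratio * wind_speed
    decay = alpha1 * abs(np.cos(theta)) + alpha2 * abs(np.sin(theta))
    delay_term = omega * mic_distance / u_c
    magnitude = np.exp(-decay * delay_term)
    phase = np.exp(-1j * delay_term * np.cos(theta))  # assumed sign convention
    return magnitude * phase

freqs = np.linspace(0.0, 2000.0, 101)
gamma_longitudinal = corcos_coherence(freqs, 0.02, 5.0, theta=0.0)
gamma_lateral = corcos_coherence(freqs, 0.02, 5.0, theta=np.pi / 2)
```

With these placeholder constants, the coherence magnitude decays faster for a lateral air stream than for a longitudinal one, which illustrates why a direction-dependent decay rate matters.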
Multi-scale aggregation of phase information for reducing computational cost of CNN based DOA estimation
In a recent work on direction-of-arrival (DOA) estimation of multiple
speakers with convolutional neural networks (CNNs), the phase component of
short-time Fourier transform (STFT) coefficients of the microphone signal is
given as input and small filters are used to learn the phase relations between
neighboring microphones. Due to this chosen filter size, M-1 convolution
layers are required to achieve the best performance for a microphone array with
M microphones. For arrays with a large number of microphones, this requirement
leads to a high computational cost, making the method practically infeasible. In
this work, we propose to use systematic dilations of the convolution filters in
each of the convolution layers of the previously proposed CNN for expansion of
the receptive field of the filters to reduce the computational cost of the
method. Different strategies for expansion of the receptive field of the
filters for a specific microphone array are explored. With experimental
analysis of the different strategies, it is shown that an aggressive expansion
strategy results in a considerable reduction in computational cost while a
relatively gradual expansion of the receptive field exhibits the best DOA
estimation performance along with a reduction in the computational cost.
Comment: arXiv admin note: text overlap with arXiv:1807.1172
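The effect of dilation on the number of layers can be illustrated with a small receptive-field calculation along the microphone axis. The sketch assumes filters that span two neighboring microphones (kernel size 2) and stride 1; the abstract above only says "small filters", so the kernel size, the function names and the dilation schedule `[1, 2, 4]` are assumptions.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions along one axis."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field

def layers_without_dilation(num_mics, kernel_size=2):
    """Smallest number of undilated layers whose receptive field covers all microphones."""
    return -(-(num_mics - 1) // (kernel_size - 1))  # ceiling division

M = 8
plain = receptive_field(2, [1] * (M - 1))   # seven undilated layers
dilated = receptive_field(2, [1, 2, 4])     # three layers with growing dilation
```

Under these assumptions, seven undilated layers and three dilated layers both cover all eight microphones, which shows where the saving in computation comes from.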
Modal Decomposition of Feedback Delay Networks
Feedback delay networks (FDNs) belong to a general class of recursive filters
which are widely used in sound synthesis and physical modeling applications. We
present a numerical technique to compute the modal decomposition of the FDN
transfer function. The proposed pole finding algorithm is based on the
Ehrlich-Aberth iteration for matrix polynomials and has improved computational
performance of up to three orders of magnitude compared to a scalar polynomial
root finder. We demonstrate how explicit knowledge of the FDN's modal behavior
facilitates analysis and improvements for artificial reverberation. The
statistical distributions of mode frequency and residue magnitudes demonstrate
that relatively few modes contribute a large portion of impulse response
energy.
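For readers unfamiliar with the Ehrlich-Aberth iteration mentioned in the abstract above, the following sketch applies it to an ordinary scalar polynomial. The paper uses a variant for matrix polynomials, which is not reproduced here. The function name, the circular initialization based on a Cauchy-type root bound, the angular offset and the stopping tolerance are assumptions.

```python
import numpy as np

def ehrlich_aberth(coeffs, iters=100, tol=1e-12):
    """Approximate all roots of a scalar polynomial simultaneously.

    coeffs: coefficients in descending powers, as used by np.polyval.
    """
    coeffs = np.asarray(coeffs, dtype=complex)
    n = len(coeffs) - 1
    dcoeffs = np.polyder(coeffs)
    # Start on a circle that encloses all roots (Cauchy-type bound).
    radius = 1.0 + np.max(np.abs(coeffs[1:] / coeffs[0]))
    z = radius * np.exp(1j * (2.0 * np.pi * np.arange(n) / n + 0.4))
    for _ in range(iters):
        newton = np.polyval(coeffs, z) / np.polyval(dcoeffs, z)
        diff = z[:, None] - z[None, :]
        np.fill_diagonal(diff, 1.0)          # placeholder to avoid 0-division
        inv = 1.0 / diff
        np.fill_diagonal(inv, 0.0)           # exclude the j == k terms
        repulsion = inv.sum(axis=1)          # sum over j != k of 1 / (z_k - z_j)
        step = newton / (1.0 - newton * repulsion)
        z = z - step
        if np.max(np.abs(step)) < tol:
            break
    return z

# (z - 1)(z - 2)(z - 3) = z^3 - 6 z^2 + 11 z - 6
roots = ehrlich_aberth([1.0, -6.0, 11.0, -6.0])
sorted_roots = np.sort(roots.real)
```

The repulsion term keeps the simultaneous root estimates from collapsing onto the same root, which is what distinguishes the iteration from running Newton's method independently from several starting points.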